Unsupervised neural domain adaptation for document image binarization

Abstract

Binarization is a well-known image processing task whose objective is to separate the foreground of an image from the background. One of the many tasks for which it is useful is preprocessing document images in order to identify relevant information, such as text or symbols. The wide variety of document types, alphabets, and formats makes binarization challenging. There are multiple proposals with which to solve this problem, from classical manually adjusted methods to more recent approaches based on machine learning. The latter techniques require a large amount of training data to obtain good results; however, labeling a portion of each existing collection of documents is not feasible in practice. This is a common problem in supervised learning, which can be addressed by using the so-called Domain Adaptation (DA) techniques. These techniques take advantage of the knowledge learned in one domain, for which labeled data are available, and apply it to other domains for which there are no labeled data. This paper proposes a method that combines neural networks and DA in order to carry out unsupervised document binarization. However, when the source and target domains are very similar, this adaptation could be detrimental. Our methodology, therefore, first measures the similarity between domains in an innovative manner to determine whether or not it is appropriate to apply the adaptation process. The results reported in the experimentation, evaluating up to 20 possible combinations among five different domains, show that our proposal successfully deals with the binarization of new document domains without the need for labeled data.
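
The abstract says only that domain similarity is measured "in an innovative manner" before deciding whether to adapt; it does not describe the measure. To illustrate the gating logic alone, the sketch below substitutes a simple, hypothetical measure: histogram intersection of normalized grayscale histograms, with each domain represented by a flat list of 8-bit pixels pooled from sample pages. The helper names (`gray_histogram`, `histogram_intersection`, `should_adapt`) and the 0.9 threshold are assumptions for illustration, not the paper's method.

```python
from collections import Counter
from collections.abc import Sequence

# Hypothetical stand-in for a domain-similarity measure; NOT the paper's method.


def gray_histogram(pixels: Sequence[int], bins: int = 16) -> list[float]:
    """Normalized histogram of 8-bit grayscale values (0-255) over `bins` bins."""
    if not pixels:
        raise ValueError("empty image")
    width = 256 // bins
    counts = Counter(min(p // width, bins - 1) for p in pixels)
    return [counts[b] / len(pixels) for b in range(bins)]


def histogram_intersection(h1: list[float], h2: list[float]) -> float:
    """Similarity in [0, 1]; 1.0 means the normalized histograms are identical."""
    return sum(min(a, b) for a, b in zip(h1, h2))


def should_adapt(source: Sequence[int], target: Sequence[int],
                 threshold: float = 0.9) -> bool:
    """Assumed gate: run domain adaptation only if the domains look dissimilar."""
    similarity = histogram_intersection(gray_histogram(source),
                                        gray_histogram(target))
    return similarity < threshold


if __name__ == "__main__":
    clean = [250] * 90 + [10] * 10     # bright page, dark ink
    stained = [140] * 90 + [40] * 10   # darker, degraded page
    print(should_adapt(clean, clean))    # False: same domain, skip adaptation
    print(should_adapt(clean, stained))  # True: dissimilar, adapt
```

An intensity histogram only captures global tone, so two collections with similar paper color but different scripts or layouts would score as similar; it is a placeholder for whatever measure the method actually uses.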

Similar resources

Deep Unsupervised Domain Adaptation for Image Classification via Low Rank Representation Learning

Domain adaptation is a powerful technique given a large amount of labeled data from similar attributes in different domains. In real-world applications, there is a huge amount of data, but most of it is unlabeled. It is effective in image classification, where it is expensive and time-consuming to obtain adequate labeled data. We propose a novel method named DALRRL, which consists of deep ...

Sample-oriented Domain Adaptation for Image Classification

Image processing is a method to perform some operations on an image, in order to get an enhanced image or to extract some useful information from it. The conventional image processing algorithms cannot perform well in scenarios where the training images (source domain) that are used to learn the model have a different distribution from the test images (target domain). Also, many real-world applicat...

Document Image Binarization

The principal stage of the document image analysis procedure is binarization, by which the pixels are classified into text and background. It is a crucial stage that can affect later stages, including the final character recognition stage. This thesis is focused on document image binarization, including both binarization techniques and evaluation methodologies. Specifically, accordin...
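
One classical, global way to perform this pixel-level classification is Otsu's method, which picks the single threshold that maximizes the between-class variance of the grayscale histogram. It is named here only as an illustration; the snippet does not say which techniques the thesis covers. A minimal pure-Python sketch, assuming 8-bit grayscale pixels in a flat list and dark text on a light background; `otsu_threshold` and `binarize` are hypothetical helper names.

```python
from collections.abc import Sequence


def otsu_threshold(pixels: Sequence[int]) -> int:
    """Return the threshold t in 0..255 that maximizes between-class variance.

    Class 0 holds pixels <= t (dark, text); class 1 holds pixels > t (background).
    """
    if not pixels:
        raise ValueError("empty image")
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0    # pixel count of class 0 (intensities <= t)
    sum0 = 0  # intensity sum of class 0
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue  # one class is empty, so the split is meaningless
        mu0 = sum0 / w0
        mu1 = (sum_all - sum0) / w1
        # Between-class variance, up to the constant factor 1 / total**2.
        var_between = w0 * w1 * (mu0 - mu1) ** 2
        if var_between > best_var:
            best_t, best_var = t, var_between
    return best_t


def binarize(pixels: Sequence[int], threshold: int) -> list[int]:
    """Label each pixel 1 (foreground text) if <= threshold, else 0 (background)."""
    return [1 if p <= threshold else 0 for p in pixels]


if __name__ == "__main__":
    page = [10] * 10 + [250] * 90  # 10 dark ink pixels on a bright page
    t = otsu_threshold(page)
    print(t, sum(binarize(page, t)))
```

A single global threshold struggles with uneven illumination, stains, and bleed-through, which is what motivates the locally adaptive and learned approaches listed on this page.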

Adaptive document image binarization

A new method is presented for adaptive document image binarization, where the page is considered as a collection of subcomponents such as text, background and picture. The problems caused by noise, illumination and many source type-related degradations are addressed. Two new algorithms are applied to determine a local threshold for each pixel. The performance evaluation of the algorithm utilize...
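
The snippet states that a local threshold is determined for each pixel but does not give the formula. The sketch below assumes the widely used Sauvola threshold `T = m * (1 + k * (s / R - 1))`, where `m` and `s` are the mean and standard deviation of a window centred on the pixel; `k` values around 0.2 to 0.5 and `R = 128` are common choices for 8-bit images. The function name `sauvola_binarize` and its default window size are illustrative assumptions.

```python
import math


def sauvola_binarize(img: list[list[int]], window: int = 15,
                     k: float = 0.2, r: float = 128.0) -> list[list[int]]:
    """Binarize a grayscale image (rows of 0-255 ints) with a per-pixel threshold.

    T = m * (1 + k * (s / r - 1)), where m and s are the mean and standard
    deviation of the window centred on the pixel (clipped at the borders).
    Returns 1 for foreground (dark text) and 0 for background.
    """
    h, w = len(img), len(img[0])
    half = window // 2
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Collect the (border-clipped) neighbourhood of pixel (y, x).
            vals = [img[j][i]
                    for j in range(max(0, y - half), min(h, y + half + 1))
                    for i in range(max(0, x - half), min(w, x + half + 1))]
            m = sum(vals) / len(vals)
            s = math.sqrt(sum((v - m) ** 2 for v in vals) / len(vals))
            t = m * (1 + k * (s / r - 1))
            out[y][x] = 1 if img[y][x] <= t else 0
    return out


if __name__ == "__main__":
    # Hypothetical 5x5 page: one dark "ink" pixel on a dim background.
    page = [[100] * 5 for _ in range(5)]
    page[2][2] = 40
    for row in sauvola_binarize(page, window=3):
        print(row)
```

This naive version recomputes every window, costing O(N * window^2) for N pixels; practical implementations compute the local mean and standard deviation with integral images so that each pixel costs O(1).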

Binarization of Document Image

Document image binarization is performed in the preprocessing stage of document analysis and aims to segment the foreground text from the document background. A fast and accurate document image binarization technique is important for the ensuing document image processing tasks such as optical character recognition (OCR). Though document image binarization has been studied for many years, t...

Journal

Journal title: Pattern Recognition

Year: 2021

ISSN: 1873-5142, 0031-3203

DOI: https://doi.org/10.1016/j.patcog.2021.108099